Re: [GENERAL] Determining if "in a text set" - Mailing list pgsql-sql
From | Herouth Maoz |
---|---|
Subject | Re: [GENERAL] Determining if "in a text set" |
Date | |
Msg-id | l0311070cb3240c3166d7@[147.233.148.142] Whole thread Raw |
Responses |
Re: [SQL] Re: [GENERAL] Determining if "in a text set"
Re: [GENERAL] Determining if "in a text set" |
List | pgsql-sql |
I am redirecting the discussion to the SQL list, as it is more appropriate than the GENERAL list. At 06:41 +0200 on 27/03/1999, Richi Plana wrote: > I have a table/class with a text field which contains a single word > (possibly trailed by whitespaces). Given a phrase (a text literal composed > of words separated by one or more whitespaces), what's the best way to > SELECT those tuples with fields containing instances that match exactly > one of the words in the given string? > > Example: > > Table: > Tuple# ... FieldN > 1 ... 'MON' > 2 ... 'TUE' > 3 ... 'THURS' > 4 ... 'THU' > 5 ... 'FRI' > > Given phrase: > 'MON TUE WED THURS' > Tuple 1 will match > Tuple 2 will match > Tuple 3 will match > Tuple 4 will NOT match > Tuple 5 will NOT match > > Right now, I'm extracting (using strtok(), 8^P) each word from the given > phrase and using "SELECT * FROM {class} where FieldN LIKE '%{word}%'", but > it's terribly slow, relies on strtok() to parse words, and Tuple 4 above > will match. First, do without the strtok. You could achieve the same result (that also matches tuple 4) with: SELECT * FROM {class} WHERE '{phrase}' LIKE '%' || ( rtrim( FieldN ) || '%' ); Hehe... the strange expression simply cuts off spaces from the right side of the field, and concatenates % signs on its left and right. So you match the phrase against the keyword, not the other way around. It is more readable if you use regular expression rather than 'like': SELECT * FROM {class} WHERE '{phrase}' ~ rtrim( FieldN ); Now, turning attention to eliminating the fourth tuple from matching: I'm not an expert on regular expressions. I think postgres does not support perl's concept of word boundaries. But if the phrase is supposed to contain just spaces (ascii 32, if you will) as word separators, I'd match against the field with a space on the left and nothing on the right, a space on the right and nothing on the left, spaces on both sides, or standing alone: SELECT * FROM {class} WHERE '{phrase}' ~ ' ' || ( rtrim( FieldN ) || '$' ) OR '{phrase}' ~ '^' || ( rtrim( FieldN ) ||' ' ) OR '{phrase}' ~ ' ' || ( rtrim( FieldN ) || ' ' ) OR '{phrase}' = rtrim( FieldN ); The equivalent with 'like' is: SELECT * FROM {class} WHERE '{phrase}' LIKE '% ' || rtrim( FieldN ) OR '{phrase}' LIKE rtrim( FieldN ) || ' %' OR'{phrase}' LIKE '% ' || ( rtrim( FieldN ) || ' %' ) OR '{phrase}' = rtrim( FieldN ); If the spaces between the words in your phrase are not just plain spaces but can also be tabs etc, you will have to use the regexp version, and write something like '[ \t\r\f\n]' - though I'm less than sure that postgres's regular expressions support these. Can anyone tell us which regexp definition postgres uses? Herouth -- Herouth Maoz, Internet developer. Open University of Israel - Telem project http://telem.openu.ac.il/~herutma